Model Based Reinforcement Learning with Final Time Horizon Optimization

Authors

Wei Sun

Evangelos Theodorou

Panagiotis Tsiotras

Abstract

We present one of the first algorithms for model-based reinforcement learning and trajectory optimization with a free final time horizon. Grounded in optimal control theory and dynamic programming, we derive a set of backward differential equations that propagate the value function and provide the optimal control policy and the optimal time horizon. The resulting policy generalizes previous results in model-based trajectory optimization. Our analysis shows that the proposed algorithm recovers the theoretical optimal solution on a low-dimensional linear problem. Finally, we present application results on nonlinear systems.
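As a rough illustration of what a free final time means on a low-dimensional linear problem, the sketch below solves a scalar linear-quadratic problem with a constant time penalty. It is not the algorithm from the paper: it is a plain brute-force scan over candidate horizons, and the dynamics, cost weights, function name, and parameter values are all assumptions chosen so that the answer can be checked by hand. For a value function of the form V = 0.5 * P * x^2 + c * (T - t), the coefficient P obeys a backward Riccati equation; because that equation is autonomous in the time-to-go, a single backward integration yields the cost of every horizon at once.

```python
import numpy as np


def free_final_time_lqr(a=0.0, b=1.0, r=1.0, qf=1.0, c=0.5, x0=4.0,
                        t_max=10.0, n=10001):
    """Scan candidate horizons T for a scalar LQ problem with a time penalty.

    Assumed (hypothetical) problem, not taken from the paper:
      dynamics: dx/dt = a*x + b*u
      cost:     integral_0^T (c + 0.5*r*u**2) dt + 0.5*qf*x(T)**2
    Returns the best horizon on the grid, its cost, and P at time-to-go T.
    """
    taus = np.linspace(0.0, t_max, n)   # time-to-go grid, also candidate T
    h = taus[1] - taus[0]

    def rhs(p):
        # Riccati equation written in time-to-go: dP/dtau = 2aP - (b^2/r)P^2
        return 2.0 * a * p - (b ** 2 / r) * p ** 2

    P = np.empty(n)
    P[0] = qf                           # terminal condition P(tau = 0) = qf
    for k in range(n - 1):              # classical RK4 step
        k1 = rhs(P[k])
        k2 = rhs(P[k] + 0.5 * h * k1)
        k3 = rhs(P[k] + 0.5 * h * k2)
        k4 = rhs(P[k] + h * k3)
        P[k + 1] = P[k] + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

    # Optimal cost from x0 for each fixed horizon T = taus[i]
    cost = 0.5 * P * x0 ** 2 + c * taus
    i = int(np.argmin(cost))
    return taus[i], cost[i], P[i]


if __name__ == "__main__":
    t_star, j_star, p0 = free_final_time_lqr()
    print(t_star, j_star, -1.0 * p0 / 1.0)  # horizon, cost, initial gain -b*P/r
```

With the default values (a = 0, b = r = qf = 1), the Riccati solution is P = 1 / (1 + tau), so the cost 8 / (1 + T) + 0.5 * T is minimized at T = 3, which the scan should reproduce to within the grid spacing.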

Similar resources

Cycle Time Optimization of Processes Using an Entropy-Based Learning for Task Allocation

Cycle time optimization could be one of the great challenges in business process management. Although there is much research on this subject, task similarities have been paid little attention. In this paper, a new approach is proposed to optimize cycle time by minimizing entropy of work lists in resource allocation while keeping workloads balanced. The idea of the entropy of work lists comes fr...

Reinforcement Learning with Time

This paper steps back from the standard infinite horizon formulation of reinforcement learning problems to consider the simpler case of finite horizon problems. Although finite horizon problems may be solved using infinite horizon learning algorithms by recasting the problem as an infinite horizon problem over a state space extended to include time, we show that such an application of infinite ...

Finite Horizon Learning

Incorporating adaptive learning into macroeconomics requires assumptions about how agents incorporate their forecasts into their decision-making. We develop a theory of bounded rationality that we call finite-horizon learning. This approach generalizes the two existing benchmarks in the literature: Euler-equation learning, which assumes that consumption decisions are made to satisfy the one-step...

Low-Area/Low-Power CMOS Op-Amps Design Based on Total Optimality Index Using Reinforcement Learning Approach

This paper presents the application of reinforcement learning in automatic analog IC design. In this work, the Multi-Objective approach by Learning Automata is evaluated for accommodating required functionalities and performance specifications considering optimal minimizing of MOSFETs area and power consumption for two famous CMOS op-amps. The results show the ability of the proposed method to ...

The Importance of Clipping in Neurocontrol by Direct Gradient Descent on the Cost-to-Go Function and in Adaptive Dynamic Programming

In adaptive dynamic programming, neurocontrol and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimise a total cost function. In this paper we show that when discretized time is used to model the motion of the agent, it can be very important to do “clipping” on the motion of the agent in the final time step of the trajectory. By clipping we mean tha...

Journal: CoRR

Volume: abs/1509.01186

Publication date: 2015
